 Central North Sea




BEAVER: An Efficient Deterministic LLM Verifier

Suresh, Tarun, Wadhwa, Nalin, Banerjee, Debangshu, Singh, Gagandeep

arXiv.org Artificial Intelligence

As large language models (LLMs) transition from research prototypes to production systems, practitioners need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates give an intuition for model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification, and secure code generation tasks across multiple state-of-the-art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high-risk instances than baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.
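The idea of bounding a satisfaction probability by exploring prefixes can be illustrated with a small sketch. This is not BEAVER's algorithm or data structures; it is a minimal best-first exploration under stated assumptions. The names `toy_next_token_probs`, `no_b_constraint`, and `bound_satisfaction` are hypothetical, and the toy distribution stands in for a real LLM. The lower bound is the probability mass of completed sequences that satisfy the constraint; the upper bound adds the mass of unexplored frontier prefixes that have not yet violated it.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Token = str
Prefix = Tuple[Token, ...]
EOS = "<eos>"


def toy_next_token_probs(prefix: Prefix) -> Dict[Token, float]:
    """Hypothetical stand-in for an LLM's next-token distribution."""
    if len(prefix) >= 3:
        return {EOS: 1.0}  # force termination after three tokens
    return {"a": 0.5, "b": 0.3, EOS: 0.2}


def no_b_constraint(prefix: Prefix) -> bool:
    """Prefix-closed constraint: once a 'b' appears, every extension fails too."""
    return "b" not in prefix


def bound_satisfaction(
    next_probs: Callable[[Prefix], Dict[Token, float]],
    constraint: Callable[[Prefix], bool],
    max_expansions: int,
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on P(generated sequence satisfies constraint)."""
    lower = 0.0
    # Max-heap on probability mass (stored negated), so heavy prefixes expand first.
    frontier: List[Tuple[float, Prefix]] = [(-1.0, ())]
    expansions = 0
    while frontier and expansions < max_expansions:
        neg_mass, prefix = heapq.heappop(frontier)
        expansions += 1
        for token, p in next_probs(prefix).items():
            mass = -neg_mass * p
            if token == EOS:
                lower += mass  # a complete, satisfying sequence
            else:
                child = prefix + (token,)
                if constraint(child):
                    heapq.heappush(frontier, (-mass, child))
                # Violating prefixes are dropped: no extension can satisfy
                # a prefix-closed constraint again.
    # Everything still on the frontier might or might not end up satisfying.
    upper = lower + sum(-neg_mass for neg_mass, _ in frontier)
    return lower, upper
```

With a budget of one expansion the sketch returns the interval [0.2, 0.7]; with enough budget to exhaust the toy space both bounds meet at 0.475, the exact satisfaction probability for this toy model.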


Automating the Refinement of Reinforcement Learning Specifications

Ambadkar, Tanmay, Žikelić, Đorđe, Verma, Abhinav

arXiv.org Artificial Intelligence

Logical specifications have been shown to help reinforcement learning algorithms in achieving complex tasks. However, when a task is under-specified, agents might fail to learn useful policies. In this work, we explore the possibility of improving coarse-grained logical specifications via an exploration-guided strategy. We propose AutoSpec, a framework that searches for a logical specification refinement whose satisfaction implies satisfaction of the original specification, but which provides additional guidance, thereby making it easier for reinforcement learning algorithms to learn useful policies. AutoSpec is applicable to reinforcement learning tasks specified via the SpectRL specification logic. We exploit the compositional nature of specifications written in SpectRL and design four refinement procedures that modify the abstract graph of the specification by either refining its existing edge specifications or introducing new edge specifications. We prove that all four procedures maintain specification soundness, i.e., any trajectory satisfying the refined specification also satisfies the original. We then show how AutoSpec can be integrated with existing reinforcement learning algorithms for learning policies from logical specifications. Our experiments demonstrate that using the refined logical specifications produced by AutoSpec yields promising improvements in the complexity of control tasks that can be solved.


Space Explanations of Neural Network Classification

Labbaf, Faezeh, Kolárik, Tomáš, Blicha, Martin, Fedyukovich, Grigory, Wand, Michael, Sharygina, Natasha

arXiv.org Artificial Intelligence

Explainability of decision-making AI systems (XAI), and specifically neural networks (NNs), is a key requirement for deploying AI in sensitive areas [18]. A recent trend in explaining NNs is based on formal methods and logic, providing explanations for the decisions of machine learning systems [24, 31, 32, 41, 42, 44] accompanied by provable guarantees regarding their correctness. Yet rigorous exploration of the continuous feature space requires estimating decision boundaries with complex shapes. This remains a challenge because existing explanations [24, 31, 32, 41, 42, 44] constrain only individual features and hence fail to capture relationships among the features that are essential to understanding the reasons behind the multi-parametrized classification process. We address the need to provide interpretations of NN systems that are as meaningful as possible using a novel concept of Space Explanations, delivered by a flexible symbolic reasoning framework where Craig interpolation [12] is at the heart of the machinery.


A Problem-Oriented Taxonomy of Evaluation Metrics for Time Series Anomaly Detection

Yang, Kaixiang, Liu, Jiarong, Song, Yupeng, Yang, Shuanghua, Zhou, Yujue

arXiv.org Machine Learning

Time series anomaly detection is widely used in IoT and cyber-physical systems, yet its evaluation remains challenging due to diverse application objectives and heterogeneous metric assumptions. This study introduces a problem-oriented framework that reinterprets existing metrics based on the specific evaluation challenges they are designed to address, rather than their mathematical forms or output structures. We categorize over twenty commonly used metrics into six dimensions: (1) basic accuracy-driven evaluation, (2) timeliness-aware reward mechanisms, (3) tolerance to labeling imprecision, (4) penalties reflecting human-audit cost, (5) robustness against random or inflated scores, and (6) parameter-free comparability for cross-dataset benchmarking. Comprehensive experiments are conducted to examine metric behavior under genuine, random, and oracle detection scenarios. By comparing their resulting score distributions, we quantify each metric's discriminative ability, that is, its capability to distinguish meaningful detections from random noise. The results show that while most event-level metrics exhibit strong separability, several widely used metrics (e.g., NAB, Point-Adjust) demonstrate limited resistance to random-score inflation. These findings reveal that metric suitability is inherently task-dependent and must be aligned with the operational objectives of IoT applications. The proposed framework offers a unified analytical perspective for understanding existing metrics and provides practical guidance for selecting or developing more context-aware, robust, and fair evaluation methodologies for time series anomaly detection. The emergence of the Internet of Things (IoT) has accelerated digital transformation across numerous domains.
Its defining characteristic lies in the large-scale deployment of intelligent and heterogeneous devices (such as sensors, actuators, and RFID systems) that are interconnected via the Internet to enable autonomous communication without human intervention [1]. Currently, more than 12 billion IoT devices are in operation, and this number is projected to reach 125 billion by 2030 [2]. Consequently, the volume of data generated by these devices continues to soar, with an expected total of 79.4 ZB by 2025 [3]. In industrial contexts, the integration of IoT technologies has driven the ongoing Industry 4.0 revolution, emphasizing connectivity, automation, and intelligence. Kaixiang Yang, Jiarong Liu, Yupeng Song, and Yujue Zhou are with the School of Artificial Intelligence, Yunnan University, Kunming 650091, China. Shuanghua Yang is with Beijing Normal University - Hong Kong Baptist University, Zhuhai 519087, China. This work was supported in part by the Yunnan Fundamental Research Projects under Grant 202401AU070151, and in part by the Yunnan Provincial Science and Technology Talent and Platform Plan under Grant 202505AF350053.
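The abstract's remark that Point-Adjust has limited resistance to score inflation can be made concrete with a small sketch. It uses the commonly described form of point adjustment (if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected), which is an assumption here and not taken from this paper; the function names `point_adjust` and `f1_score` and the toy labels are hypothetical.

```python
from typing import List


def point_adjust(pred: List[int], truth: List[int]) -> List[int]:
    """Mark an entire true anomaly segment as detected if any point in it was flagged."""
    adjusted = list(pred)
    n = len(truth)
    i = 0
    while i < n:
        if truth[i] == 1:
            j = i
            while j < n and truth[j] == 1:
                j += 1  # j is one past the end of the current segment
            if any(pred[i:j]):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


def f1_score(pred: List[int], truth: List[int]) -> float:
    """Point-wise F1; returns 0.0 when there are no true positives."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: one five-point anomaly; the detector hits it once and raises one false alarm.
truth = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 0, 0, 1, 0, 0, 0, 1, 0]

raw_f1 = f1_score(pred, truth)
adjusted_f1 = f1_score(point_adjust(pred, truth), truth)
```

On these toy labels the raw F1 is 2/7 (about 0.29), while the point-adjusted F1 is 10/11 (about 0.91): a single hit inside the segment is credited as five, which is the kind of inflation a random detector can also benefit from.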


Extracting Robust Register Automata from Neural Networks over Data Sequences

Hong, Chih-Duo, Jiang, Hongjian, Lin, Anthony W., Markgraf, Oliver, Parsert, Julian, Tan, Tony

arXiv.org Artificial Intelligence

Automata extraction is a method for synthesising interpretable surrogates for black-box neural models that can be analysed symbolically. Existing techniques assume a finite input alphabet, and thus are not directly applicable to data sequences drawn from continuous domains. We address this challenge with deterministic register automata (DRAs), which extend finite automata with registers that store and compare numeric values. Our main contribution is a framework for robust DRA extraction from black-box models: we develop a polynomial-time robustness checker for DRAs with a fixed number of registers, and combine it with passive and active automata learning algorithms. This combination yields surrogate DRAs with statistical robustness and equivalence guarantees. As a key application, we use the extracted automata to assess the robustness of neural networks: for a given sequence and distance metric, the DRA either certifies local robustness or produces a concrete counterexample. Experiments on recurrent neural networks and transformer architectures show that our framework reliably learns accurate automata and enables principled robustness evaluation. Overall, our results demonstrate that robust DRA extraction effectively bridges neural network interpretability and formal reasoning without requiring white-box access to the underlying network.
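To make the notion of a deterministic register automaton concrete, here is a minimal sketch of a one-register automaton over numeric sequences. It illustrates only the data structure (states plus registers that store and compare values); it is not the paper's extraction framework or robustness checker. The class and field names (`Transition`, `RegisterAutomaton`, `store_in`) and the example language are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Set


@dataclass(frozen=True)
class Transition:
    source: str
    guard: Callable[[float, List[float]], bool]  # compares the input with registers
    store_in: Optional[int]  # register index overwritten with the input, or None
    target: str


class RegisterAutomaton:
    def __init__(self, initial: str, accepting: Set[str],
                 transitions: List[Transition], num_registers: int) -> None:
        self.initial = initial
        self.accepting = accepting
        self.transitions = transitions
        self.num_registers = num_registers

    def accepts(self, word: Sequence[float]) -> bool:
        state = self.initial
        registers = [0.0] * self.num_registers
        for value in word:
            enabled = [t for t in self.transitions
                       if t.source == state and t.guard(value, registers)]
            if not enabled:
                return False  # no transition available: reject
            if len(enabled) > 1:
                raise ValueError("automaton is not deterministic")
            step = enabled[0]
            if step.store_in is not None:
                registers[step.store_in] = value
            state = step.target
        return state in self.accepting


# Example language: non-empty sequences whose first value is never exceeded later.
first_is_max = RegisterAutomaton(
    initial="start",
    accepting={"ok"},
    transitions=[
        Transition("start", lambda x, r: True, 0, "ok"),       # remember first value
        Transition("ok", lambda x, r: x <= r[0], None, "ok"),  # stay while within bound
        Transition("ok", lambda x, r: x > r[0], None, "bad"),  # bound exceeded
    ],
    num_registers=1,
)
```

Here `first_is_max.accepts([5.0, 3.0, 5.0, 1.0])` is `True` and `first_is_max.accepts([5.0, 3.0, 6.0])` is `False`. No finite-alphabet automaton can express this language over real-valued inputs, which is why registers are needed.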